Particulate matter (PM) consists of small particles, such as dust, dirt, and chemicals, suspended in the air. When inhaled, it can cause serious health problems.
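There are two main kinds of PM:
- PM10: "inhalable particles, with diameters that are generally 10 micrometers and smaller" [(EPA)](https://www.epa.gov/pm-pollution/particulate-matter-pm-basics#PM).
- PM2.5: "fine inhalable particles, with diameters that are generally 2.5 micrometers and smaller" [(EPA)](https://www.epa.gov/pm-pollution/particulate-matter-pm-basics#PM).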
There are national standards limiting how much PM10 and PM2.5 can be in the air. For PM2.5, the annual mean, averaged over three years, cannot exceed 12 micrograms per cubic meter.
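As a simplified sketch of that rule, the check below averages three annual means and compares the result with the 12 micrograms per cubic meter level. The helper `meets_annual_standard()` and the example values are hypothetical, and official design values also involve data-completeness and rounding rules that this sketch ignores.

```r
# Hypothetical helper: a simplified version of the annual PM2.5 rule,
# i.e. the mean of three annual means must not exceed the standard.
meets_annual_standard <- function(annual_means, standard = 12) {
  stopifnot(length(annual_means) == 3)
  three_year_mean <- mean(annual_means)
  list(three_year_mean = three_year_mean,
       meets = three_year_mean <= standard)
}

# Made-up annual means (micrograms per cubic meter) for one county.
# A single year above 12 does not break the rule on its own; the 3-year mean decides.
meets_annual_standard(c(11.5, 12.6, 11.2))
```

The dashboard's `pm25` column stores one average per county rather than separate annual means, so this is only an illustration of the arithmetic behind the standard.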
The following explores average PM2.5 levels in U.S. counties from 2008 to 2010.
For more information, visit the [United States Environmental Protection Agency](https://www.epa.gov/pm-pollution).
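Each row contains:
- [a five-digit code indicating the county (fips)](https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt#:~:text=FIPS%20codes%20are%20numbers%20which,to%20which%20the%20county%20belongs.)
- the region of the country in which the county resides
- the longitude of the centroid for that county
- the latitude of the centroid for that county
- the average PM2.5 level (micrograms per cubic meter)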
```
Rows: 571
Columns: 5
$ pm25      <dbl> 10.827805, 11.583928, 11.261996, 9.414423, 11.391494, 12.384…
$ fips      <dbl> 1069, 1073, 1089, 1097, 1103, 1113, 1117, 1121, 1125, 1127, …
$ region    <chr> "east", "east", "east", "east", "east", "east", "east", "eas…
$ longitude <dbl> -85.35039, -86.82805, -86.58823, -88.13967, -86.91892, -85.1…
$ latitude  <dbl> 31.18973, 33.52787, 34.73079, 30.72226, 34.50702, 32.37600, …
```
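`glimpse()` shows `fips` stored as a double, so the leading zero of the state part is lost (1069 rather than 01069; state code 01 is Alabama). A minimal base R sketch, using the first five rows shown in that output, restores the zero-padded five-digit code and splits it into state and county parts. The `counties` data frame and the new column names are illustrative, not part of the dashboard.

```r
# First five rows as printed by glimpse() (longitude/latitude omitted).
counties <- data.frame(
  pm25 = c(10.827805, 11.583928, 11.261996, 9.414423, 11.391494),
  fips = c(1069, 1073, 1089, 1097, 1103),
  region = "east"
)

# fips was read as a number, so sprintf() rebuilds the 5-digit, zero-padded code.
counties$fips_code <- sprintf("%05d", as.integer(counties$fips))

# The first two digits identify the state, the last three the county.
counties$state_fips  <- substr(counties$fips_code, 1, 2)
counties$county_fips <- substr(counties$fips_code, 3, 5)

counties[, c("fips", "fips_code", "state_fips", "county_fips")]
```

With the full data, the same `sprintf()` call can be applied to `data$fips`.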
As shown in Figure 1, the distribution of average PM2.5 levels is approximately symmetric, with a median of about 10 micrograms per cubic meter, Q1 of about 9, and Q3 of about 11. The lower whisker extends down to about 5 and the upper whisker up to about 15.
There are multiple outliers on both sides of the distribution. It is hard to tell exactly how many there are because some of the points overlap.
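Reading values off Figure 1 by eye is approximate, and overlapping points hide the outlier count. Base R's `boxplot.stats()` returns the numbers that `boxplot()` draws: the whisker ends, the hinges (close to, but not always identical to, Q1 and Q3), the median, and the points lying more than 1.5 times the hinge spread beyond the hinges. The wrapper name `describe_box()` is hypothetical.

```r
# Hypothetical wrapper around boxplot.stats(), which applies the same rules as boxplot():
# hinges from fivenum(), whiskers at the most extreme points within coef * (upper hinge - lower hinge).
describe_box <- function(x, coef = 1.5) {
  s <- boxplot.stats(x, coef = coef)
  list(
    lower_whisker = s$stats[1],
    lower_hinge   = s$stats[2],
    median        = s$stats[3],
    upper_hinge   = s$stats[4],
    upper_whisker = s$stats[5],
    outliers      = s$out,
    n_outliers    = length(s$out)
  )
}

# Toy vector with one obvious high outlier.
describe_box(c(1:9, 100))
```

In the dashboard, `describe_box(data$pm25)` would give the exact values behind Figure 1, including the number of overlapping outliers.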
The histogram appears slightly skewed left, with some outliers at the upper end. The tallest bin spans 9.5 to 10.5 micrograms per cubic meter. Most counties fall below the cutoff, many of them not far under it.
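The tallest bin and the share of counties under the cutoff can also be computed directly. This sketch assumes 1-unit bins with edges at half-units (for example 9.5 to 10.5), which is what the range quoted above suggests Figure 2 uses; `modal_bin()` and the toy values are hypothetical.

```r
# Hypothetical helper: tallest 1-unit bin, with bin edges at x.5 (e.g. (9.5, 10.5]).
modal_bin <- function(x) {
  breaks <- seq(floor(min(x)) - 0.5, ceiling(max(x)) + 0.5, by = 1)
  bin <- cut(x, breaks, right = TRUE, labels = FALSE)  # integer bin index per value
  counts <- tabulate(bin, nbins = length(breaks) - 1)
  i <- which.max(counts)                               # first bin with the maximum count
  c(lower = breaks[i], upper = breaks[i + 1], count = counts[i])
}

# Toy values in micrograms per cubic meter.
x <- c(9.6, 9.9, 10.2, 11.1, 12.5)
modal_bin(x)    # tallest bin: 9.5 to 10.5, holding 3 values
mean(x < 12)    # share of values below the 12 cutoff: 0.8
```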
The current air quality standard is 12 micrograms per cubic meter, but it used to be 15. All of the counties that exceed the former standard of 15 are in California, which is in the western region of the US.
All of these values appear as outliers in Figure 1.
```
# A tibble: 8 × 3
   pm25  fips region
  <dbl> <dbl> <chr>
1  16.2  6019 west
2  15.8  6029 west
3  18.4  6031 west
4  16.7  6037 west
5  15.0  6047 west
6  17.4  6065 west
7  16.3  6099 west
8  16.2  6107 west
```
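The state can be read from the FIPS codes in this table: a county code has the form SSCCC, so dropping the last three digits leaves the state code, and 6 (06) is California. The sketch below re-enters the printed rows by hand; `pm25` is rounded to one decimal as printed, so the 15.0 row is really slightly above 15. The `exceed` data frame is illustrative.

```r
# The eight rows printed by the "exceeding 15" chunk, re-entered by hand.
exceed <- data.frame(
  pm25 = c(16.2, 15.8, 18.4, 16.7, 15.0, 17.4, 16.3, 16.2),
  fips = c(6019, 6029, 6031, 6037, 6047, 6065, 6099, 6107),
  region = "west"
)

# Integer division by 1000 drops the 3-digit county part, leaving the state code.
exceed$state_fips <- exceed$fips %/% 1000

table(exceed$state_fips)   # every row has state code 6 (California)
table(exceed$region)       # and every row is in the "west" region
```

With the full data, `tapply(data$pm25 > 15, data$region, sum)` would count the exceedances in each region.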
Eastern counties have a smaller spread than western counties. Based on the boxplot medians, eastern counties generally have higher PM2.5 levels, while western counties generally have lower levels. However, some eastern counties are low outliers with values closer to those of western counties, and some western counties are high outliers with levels above the highest eastern values.
The eastern distribution appears symmetric, and the western distribution may be symmetric or slightly skewed right. The eastern median is around 10.5, while the western median is around 7.5.
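The medians and spreads read off Figure 3 can be checked with grouped summaries. This base R sketch reports the median and the interquartile range (IQR) of `pm25` per region; `region_spread()` and the toy data frame are hypothetical, and with the dashboard's data the call would be `region_spread(data)`.

```r
# Hypothetical helper: median and interquartile range of pm25 within each region.
region_spread <- function(df) {
  data.frame(
    median = tapply(df$pm25, df$region, median),
    iqr    = tapply(df$pm25, df$region, IQR)
  )
}

# Toy data: a tight "east" group and a wider "west" group.
toy <- data.frame(
  pm25   = c(10, 10.5, 11, 10.5, 5, 7.5, 12, 7),
  region = rep(c("east", "west"), each = 4)
)
region_spread(toy)
```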
The violin plot shows that the eastern distribution is quite symmetric, with values of approximately 10.5 being most common (right at the median). The western violin plot makes it more apparent that the data are likely skewed right; its highest peak, at around 7, is slightly below the median.
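A violin plot is a mirrored kernel density estimate, so its peak can be located numerically. This sketch uses base R's `density()` with its default Gaussian kernel and bandwidth, which may not match the smoothing `vioplot()` applies, so treat the result as a rough check rather than the exact peak drawn in Figure 4. `density_peak()` and the toy data are hypothetical.

```r
# Hypothetical helper: location of the highest point of a kernel density estimate.
density_peak <- function(x, ...) {
  d <- density(x, ...)        # Gaussian kernel, default bandwidth (bw.nrd0)
  d$x[which.max(d$y)]
}

# Toy data concentrated at 7 with a short right tail.
x <- c(rep(7, 20), 8, 9, 10)
density_peak(x)
```

Applied per region, `tapply(data$pm25, data$region, density_peak)` would give one estimated peak for each violin.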
Both histograms are approximately symmetric. There are more eastern counties than western counties, and the eastern histogram has a high peak around 10 micrograms per cubic meter. For western counties, the most common values are between 6.5 and 7.5. Most western counties above the cutoff line appear to be outliers, whereas more eastern counties lie above the cutoff line.
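The counts compared in Figure 5 can be tabulated directly. This sketch counts counties per region and how many lie strictly above a cutoff (12 by default); `above_cutoff()` and the toy data are hypothetical, and with the dashboard's data the call would be `above_cutoff(data)`.

```r
# Hypothetical helper: counties per region and how many exceed the cutoff.
above_cutoff <- function(df, cutoff = 12) {
  data.frame(
    counties = tapply(df$pm25, df$region, length),
    above    = tapply(df$pm25 > cutoff, df$region, sum)
  )
}

toy <- data.frame(
  pm25   = c(10, 12.5, 13, 9, 7, 16),
  region = c("east", "east", "east", "east", "west", "west")
)
above_cutoff(toy)   # east: 4 counties, 2 above 12; west: 2 counties, 1 above 12
```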
Figure 6 and Figure 7: PM2.5 levels in western counties show no clear relationship with latitude. Eastern counties may show one, but it appears to be nonlinear.
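One way to probe whether the eastern relationship is nonlinear is to compare a straight-line fit with a quadratic one. This is a swapped-in check, not the method behind Figures 6 and 7: `compare_fits()` is hypothetical, the demonstration data are synthetic, and because the quadratic model contains the linear one its in-sample R² can never be lower, so only a clearly larger gain suggests curvature.

```r
# Hypothetical helper: in-sample R^2 of linear vs quadratic fits of pm25 on latitude.
compare_fits <- function(df) {
  r2 <- function(fit) {
    1 - sum(residuals(fit)^2) / sum((df$pm25 - mean(df$pm25))^2)
  }
  lin  <- lm(pm25 ~ latitude, data = df)
  quad <- lm(pm25 ~ poly(latitude, 2), data = df)
  c(linear = r2(lin), quadratic = r2(quad))
}

# Synthetic, hump-shaped data (not from the PM2.5 dataset).
lat <- seq(30, 45, length.out = 50)
synthetic <- data.frame(latitude = lat,
                        pm25 = 11 - 0.05 * (lat - 38)^2 + 0.1 * sin(lat))
compare_fits(synthetic)
```

On the dashboard's data, `compare_fits(subset(data, region == "east"))` would apply the same comparison to the eastern counties.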
Figure 8: The relationships between all three variables are generally weak. The correlation between PM2.5 and longitude is slightly positive, while the correlations between PM2.5 and latitude and between latitude and longitude are slightly negative.
```
---
title: "Particulate Matter (PM) Pollution"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: pulse
      navbar-bg: "red"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(DT)
library(plotly)
library(ggplot2)
library(vioplot)
library(corrgram)
data<-read_csv("./avgpm25.csv")
attach(data)
```
Overview
===
Column {data-width=500}
----
### **What is Particulate Matter?**
Particulate matter (PM) consists of small particles, such as dust, dirt, and chemicals, suspended in the air. When inhaled, it can cause serious health problems.
There are two main kinds of PM:
- <span Style="color:green">PM10:</span> "inhalable particles, with diameters that are generally 10 micrometers and smaller" [(EPA)](https://www.epa.gov/pm-pollution/particulate-matter-pm-basics#PM).
- <span Style="color:green">PM2.5:</span> "fine inhalable particles, with diameters that are generally 2.5 micrometers and smaller" [(EPA)](https://www.epa.gov/pm-pollution/particulate-matter-pm-basics#PM).
### **PM Standards**
There are national standards limiting how much PM10 and PM2.5 can be in the air. For PM2.5, the annual mean, averaged over three years, cannot exceed 12 micrograms per cubic meter.
Column {data-width=500}
---
The following explores average PM2.5 levels in U.S. counties from 2008 to 2010.
For more information, visit the [United States Environmental Protection Agency](https://www.epa.gov/pm-pollution).
Data
===
Column {data-width=500}
---
### Data Table
```{r}
# show all 571 counties (data[1:500,] silently dropped the last 71 rows)
datatable(data, rownames=FALSE, colnames=c("PM2.5","FIPS","Region","Longitude","Latitude"), options=list(pageLength=20))
```
Column {data-width=500}
---
### Variables
Each row contains:
- [a five-digit code indicating the county (fips)](https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt#:~:text=FIPS%20codes%20are%20numbers%20which,to%20which%20the%20county%20belongs.); because it is stored as a number, leading zeros are dropped (e.g., 1069 is code 01069)
- the region of the country in which the county resides
- the longitude of the centroid for that county
- the latitude of the centroid for that county
- the average PM2.5 level (micrograms per cubic meter)
```{r}
glimpse(data)
```
National Averages
===
Column {.tabset data-width=600}
---
### Figure 1
```{r q1 boxplot}
boxplot(data$pm25,main="Average PM2.5 Levels", xlab="Average PM2.5 Level", ylab="Micrograms per Cubic Meter", col="lightgreen", ylim=c(0,20))
```
### Figure 2
```{r}
cutoff<-"12 micrograms per cubic meter"
# annotate() draws the cutoff label once; geom_text(data=data) drew one overlapping copy per row
ggplot(data, aes(x=pm25))+geom_histogram(color="black",fill="lightgreen",binwidth = 1)+geom_vline(xintercept=12)+annotate("text", x=12, y=125, label=cutoff, size=3)+theme(text=element_text(size=10))+labs(title="Average PM2.5 Levels Cutoff Point", y="Counties",x="Average PM2.5 Level (micrograms per cubic meter)")
```
Column {.tabset data-width=400}
---
### F1 Analysis
As shown in Figure 1, the distribution of average PM2.5 levels is approximately symmetric, with a median of about 10 micrograms per cubic meter, Q1 of about 9, and Q3 of about 11. The lower whisker extends down to about 5 and the upper whisker up to about 15.
There are multiple outliers on both sides of the distribution. It is hard to tell exactly how many there are because some of the points overlap.
### F2 Analysis
The histogram appears slightly skewed left, with some outliers at the upper end. The tallest bin spans 9.5 to 10.5 micrograms per cubic meter. Most counties fall below the cutoff, many of them not far under it.
### Air Quality Standards
The current air quality standard is 12 micrograms per cubic meter, but it used to be 15. All of the counties that exceed the former standard of 15 are in California, which is in the western region of the US.
All of these values appear as outliers in Figure 1.
```{r exceeding 15}
cond<-data$pm25>15
data[cond,1:3]
```
Regional Differences
===
Column {.tabset data-width=700}
---
### Figure 3
```{r}
boxplot(pm25~region, data=data,main="Average PM2.5 Levels by Region", ylab="Average PM2.5 Level (micrograms per cubic meter)", xlab="Region",names=c("East","West"), col=c("blue","red"))
```
### Figure 4
```{r}
vioplot(pm25~region, data=data, main="Average PM2.5 Levels by Region", xlab="Region", ylab="Average PM2.5 Level (micrograms per cubic meter)", col=c("blue","red"),names=c("East","West"))
```
### Figure 5
```{r}
# annotate() adds one cutoff label per facet instead of one per row
ggplot(data, aes(x=pm25))+geom_histogram(color="black",fill="lightgreen",binwidth = 1)+geom_vline(xintercept=12)+annotate("text", x=12, y=125, label=cutoff, size=3)+theme(text=element_text(size=10))+labs(title="Average PM2.5 Levels Cutoff Point by Region", y="Counties",x="Average PM2.5 Level (micrograms per cubic meter)")+facet_wrap(~region)
```
Column {.tabset data-width=300}
---
### F3 Analysis
Eastern counties have a smaller spread than western counties. Based on the boxplot medians, eastern counties generally have higher PM2.5 levels, while western counties generally have lower levels. However, some eastern counties are low outliers with values closer to those of western counties, and some western counties are high outliers with levels above the highest eastern values.
The eastern distribution appears symmetric, and the western distribution may be symmetric or slightly skewed right. The eastern median is around 10.5, while the western median is around 7.5.
### F4 Analysis
The violin plot shows that the eastern distribution is quite symmetric, with values of approximately 10.5 being most common (right at the median). The western violin plot makes it more apparent that the data are likely skewed right; its highest peak, at around 7, is slightly below the median.
### F5 Analysis
Both histograms are approximately symmetric. There are more eastern counties than western counties, and the eastern histogram has a high peak around 10 micrograms per cubic meter. For western counties, the most common values are between 6.5 and 7.5. Most western counties above the cutoff line appear to be outliers, whereas more eastern counties lie above the cutoff line.
Longitude/Latitude {data-orientation=rows}
===
Row {data-height=500}
---
### Figure 6
```{r}
ggplot(data, aes(x=latitude,y=pm25))+geom_point(aes(color=region))+scale_color_brewer(palette = "Set1")+labs(title="PM2.5 Levels by Latitude",x="Latitude (degrees)",y="Average PM2.5 Level (micrograms per cubic meter)")
```
### Figure 7
```{r}
ggplot(data, aes(x=latitude,y=pm25))+geom_point(aes(color=region))+scale_color_brewer(palette = "Set1")+facet_wrap(~region)+labs(title="PM2.5 Levels by Latitude and Region",x="Latitude (degrees)",y="Average PM2.5 Level (micrograms per cubic meter)")
```
Row {data-height=500}
---
### Figure 8
```{r}
vars<-c("pm25","latitude","longitude")
corrgram(data[,vars], order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, main="Relationships Between PM2.5 Levels, Latitude, and Longitude")
```
### Analysis
**Figure 6 and Figure 7:** PM2.5 levels in western counties show no clear relationship with latitude. Eastern counties may show one, but it appears to be nonlinear.
**Figure 8:** The relationships between all three variables are generally weak. The correlation between PM2.5 and longitude is slightly positive, while the correlations between PM2.5 and latitude and between latitude and longitude are slightly negative.
```
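The shaded and pie panels in Figure 8 encode correlation coefficients (Pearson by default in `corrgram()`), which can be printed directly to see how weak they are. This sketch uses base R's `cor()` and, as stand-in data, the five rows shown in the `glimpse()` output; five rows are far too few to say anything about the full dataset, so with the dashboard's data the call would be `cor(data[, vars])`. The `five` data frame is illustrative.

```r
# Stand-in rows copied from the glimpse() output (first five counties only).
five <- data.frame(
  pm25      = c(10.827805, 11.583928, 11.261996, 9.414423, 11.391494),
  latitude  = c(31.18973, 33.52787, 34.73079, 30.72226, 34.50702),
  longitude = c(-85.35039, -86.82805, -86.58823, -88.13967, -86.91892)
)

vars <- c("pm25", "latitude", "longitude")
# Pearson correlation matrix, rounded for reading.
round(cor(five[, vars]), 2)
```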